This tutorial surveys design methods for energy-efficient system-level design. We consider 
electronic systems consisting of a hardware platform and software layers. We consider the 
three major constituents of hardware that consume energy, namely computation, 
communication, and storage units, and we review methods of reducing their energy 
consumption. We also study models for analyzing the energy cost of software, and methods 
for energy-efficient software design and compilation. This survey ... 
We present a survey of the state-of-the-art techniques used in performing data and 
memory-related optimizations in embedded systems. The optimizations are targeted directly 
or indirectly at the memory subsystem, and impact one or more of three important cost 
metrics: area, performance, and power dissipation of the resulting implementation. We first 
examine architecture-independent optimizations in the form of code transformations. We 
next cover a broad spectrum of optimizati ... 

Keywords: DRAM, SRAM, address generation, allocation, architecture exploration, code 
transformation, data cache, data optimization, high-level synthesis, memory architecture 
customization, memory power dissipation, register file, size estimation, survey 
This paper describes OneChip, a third-generation reconfigurable processor architecture that 
integrates a Reconfigurable Functional Unit (RFU) into a superscalar Reduced Instruction Set 
Computer (RISC) processor's pipeline. The architecture allows dynamic scheduling and 
dynamic reconfiguration. It also provides support for pre-loading configurations and for 
Least Recently Used (LRU) configuration management. To evaluate the performance of the 
OneChip architecture, several off-the-s ... 

Keywords: OneChip, reconfigurable processors, superscalar processors 
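
The LRU configuration management mentioned in the abstract above can be illustrated in software. The following is only a minimal sketch of a generic least-recently-used policy, not OneChip's hardware mechanism; the class name `ConfigStore`, the `capacity` parameter, and the `load` callback are all hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class ConfigStore:
    """Hypothetical store holding at most `capacity` loaded configurations."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps config_id -> configuration data, ordered from least to
        # most recently used.
        self.loaded = OrderedDict()

    def use(self, config_id, load):
        """Return a configuration, loading it on a miss.

        `load` is an assumed callback that produces the configuration
        data for a given id. On a miss with a full store, the least
        recently used configuration is evicted first.
        """
        if config_id in self.loaded:
            # Hit: mark this configuration as most recently used.
            self.loaded.move_to_end(config_id)
            return self.loaded[config_id]
        if len(self.loaded) >= self.capacity:
            # Evict the entry at the front (least recently used).
            self.loaded.popitem(last=False)
        self.loaded[config_id] = load(config_id)
        return self.loaded[config_id]
```

With a capacity of two, using configurations a, b, a, c in that order would evict b, since a was touched more recently.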
As the gap between processor and memory speeds continues to widen, methods for 
evaluating memory system designs before they are implemented in hardware are becoming 
increasingly important. One such method, trace-driven memory simulation, has been the 
subject of intense interest among researchers and has, as a result, enjoyed rapid 
development and substantial improvements during the past decade. This article surveys and 
analyzes these developments by establishing criteria for evaluating trac ... 

Keywords: TLBs, caches, memory management, memory simulation, trace-driven 
simulation 
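
As a rough illustration of what trace-driven memory simulation involves, the sketch below replays a list of memory addresses through a direct-mapped cache model and counts hits and misses. It is a toy model under stated assumptions, not one of the simulators the article surveys; the function name and the parameters `num_lines` and `line_size` are hypothetical.

```python
def simulate_direct_mapped(trace, num_lines, line_size):
    """Replay an address trace through a direct-mapped cache model.

    trace:     iterable of integer byte addresses (the recorded trace)
    num_lines: number of cache lines (assumed parameter)
    line_size: bytes per cache line (assumed parameter)
    Returns a (hits, misses) tuple.
    """
    tags = [None] * num_lines  # tag currently stored in each line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size   # memory block holding this address
        index = block % num_lines   # cache line the block maps to
        tag = block // num_lines    # identifies which block is resident
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag       # fill the line on a miss
    return hits, misses
```

Because the trace is recorded once and replayed against different parameter values, the same input can be used to compare candidate cache configurations before any hardware exists.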
Dynamically dispatched calls often limit the performance of object-oriented programs, since 
object-oriented programming encourages factoring code into small, reusable units, thereby 
increasing the frequency of these expensive operations. Frequent calls not only slow down 
execution with the dispatch overhead per se, but more importantly they hinder optimization 
by limiting the range and effectiveness of standard global optimizations. In particular, 
dynamically dispatched calls prevent stand ... 

Keywords: adaptive optimization, pause clustering, profile-based optimization, run-time 
compilation, type feedback 
We address the problem of code-size minimization in VLSI systems with embedded DSP 
processors. Reducing code size reduces the production cost of embedded systems. We use 
data-compression methods to develop code-size minimization strategies. In our framework, 
the compressed program consists of a skeleton and a dictionary. We show that the 
dictionary can be computed by solving a set-covering problem derived from the original 
program. To execute the compressed code, we describe two me ... 

Keywords: code size optimization, compression 
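
The skeleton-plus-dictionary idea in the abstract above can be sketched as follows. This is only an illustration of the general representation, assuming the dictionary is already given; it does not reproduce the paper's set-covering construction or its execution methods, and the function names and the `("ref", i)` / `("lit", x)` encoding are hypothetical.

```python
def compress(program, dictionary):
    """Build a skeleton by replacing dictionary entries with references.

    program:    list of instructions (any hashable values)
    dictionary: list of tuples, each a sequence of instructions
    Returns a list of ("ref", index) and ("lit", instruction) items.
    """
    skeleton = []
    i = 0
    while i < len(program):
        for idx, entry in enumerate(dictionary):
            # First matching dictionary entry at this position wins.
            if tuple(program[i:i + len(entry)]) == entry:
                skeleton.append(("ref", idx))
                i += len(entry)
                break
        else:
            # No entry matched: keep the instruction as a literal.
            skeleton.append(("lit", program[i]))
            i += 1
    return skeleton


def decompress(skeleton, dictionary):
    """Expand a skeleton back into the original instruction list."""
    program = []
    for kind, value in skeleton:
        if kind == "ref":
            program.extend(dictionary[value])
        else:
            program.append(value)
    return program
```

The size saving comes from storing each repeated sequence once in the dictionary and a short reference at every occurrence in the skeleton.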
The issue logic of a dynamically-scheduled superscalar processor is a complex mechanism 
devoted to start the execution of multiple instructions every cycle. Due to its complexity, it 
is responsible for a significant percentage of the energy consumed by a microprocessor. The 
energy consumption of the issue logic depends on several architectural parameters, the 
instruction issue queue size being one of the most important. In this paper we present a 
technique to reduce the energy consumption ... 

Keywords: adaptive hardware, energy consumption, issue logic, low power 
Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise 
margins make future generations of microprocessors increasingly prone to transient 
hardware faults. Most commercial fault-tolerant computers use fully replicated hardware 
components to detect microprocessor faults. The components are lockstepped (cycle-by-cycle 
synchronized) to ensure that, in each cycle, they perform the same operation on the 
same inputs, producing the same outputs in the abs ... 
While much current research concerns multiprocessor design, few traces of parallel 
programs are available for analyzing the effect of design trade-offs. Existing trace collection 
methods have serious drawbacks: trap-driven methods often slow down program execution 
by more than 1000 times, significantly perturbing program behavior; microcode modification 
is faster, but the technique is neither general nor portable. This paper describes a new tool, 
called MPTRACE, for collecting tr ... 
Richard P. Gabriel, Larry M. Masinter 

August 1982 Proceedings of the 1982 ACM symposium on LISP and functional 
programming 
This paper describes the issues involved in evaluating the performance of Lisp systems. We 
explore the various levels at which quantitative statements can be made about the 
performance of a Lisp system, giving examples from existing implementations wherever 
possible. Our thesis is that benchmarking is most effective when performed in conjunction 
with an analysis of the underlying Lisp implementation and computer architecture. We 
examine some simple benchmarks which have been used to measure ... 
Mehrdad Reshadi, Prabhat Mishra, Nikil Dutt 

June 2003 Proceedings of the 40th conference on Design automation 

Instruction set simulators are critical tools for the exploration and validation of new 
programmable architectures. Due to increasing complexity of the architectures and 
time-to-market pressure, performance is the most important feature of an instruction-set simulator. 
Interpretive simulators are flexible but slow, whereas compiled simulators deliver speed at 
the cost of flexibility. This paper presents a novel technique for generation of fast instruction 
set simulators that combines the benefit ... 

Keywords: compiled simulation, instruction abstraction, instruction set architectures, 
interpretive simulation 
As far as the future of communication is concerned, we have seen that there is great 
demand for audio and video data to complement text. Digital signal processing (DSP) is the 
science that enables traditionally analog audio and video signals to be processed digitally for 
transmission, storage, reproduction and manipulation. In this paper, we will explain the 
various DSP architectures and their silicon implementation. We will also discuss the 
state-of-the-art and examine the issues pertaining to pe ... 

Jamison D. Collins, Hong Wang, Dean M. Tullsen, Christopher Hughes, Yong-Fong Lee, Dan 
This paper explores Speculative Precomputation, a technique that uses idle thread context in 
a multithreaded architecture to improve performance of single-threaded applications. It 
attacks program stalls from data cache misses by pre-computing future memory accesses in 
available thread contexts, and prefetching these data. This technique is evaluated by 
simulating the performance of a research processor based on the Itanium™ ISA supporting 
This paper describes the design and implementation of a method for producing compact, 
bytecoded instruction sets and interpreters for them. It accepts a grammar for programs 
written using a simple bytecoded stack-based instruction set, as well as a training set of 
sample programs. The system transforms the grammar, creating an expanded grammar 
that represents the same language as the original grammar, but permits a shorter 
derivation of the sample programs and others like them. A program's de ... 

Keywords: bytecode interpretation, context-free grammars, program compression, 
variable-to-fixed length codes 
We describe the Slice Processor micro-architecture that implements a generalized 
operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, 
or the computation slice that can be used to calculate forthcoming memory references. This 
is in contrast to outcome-based predictors that exploit regularities in the (address) outcome 
stream. Slice processors are a generalization of existing operation-based prefetching 
mechanisms such as stream buffers where the ... 
Control intensive scalar programs pose a very different challenge to highly pipelined 
supercomputers than vectorizable numeric applications. Function call/return and branch 
instructions disrupt the flow of instructions through the pipeline, degrading the utilization of 
the pipelined datapaths. This paper describes control flow optimization for scalar processing 
using an optimizing compiler. To obtain program control flow information, a system 
independent profiler has been integrated into th ... 
State-of-the-art Java Virtual Machines with Just-In-Time (JIT) compilers make use of 
advanced compiler techniques, run-time profiling and adaptive compilation to improve 
performance. However, these techniques for alleviating performance bottlenecks are more 
effective in long-running workloads, such as server applications. Short-running Java 
programs, or client workloads, spend a large fraction of their execution time in compilation 
instead of useful execution when run using JIT compilers. In ... 

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc. 
